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We apply the Wako-Saito-Munoz-Eaton model to the study of Myotrophin, a small 

ankyrin repeat protein, whose folding equilibrium and kinetics have been recently 

characterized experimentally. The model, which is a native-centric with binary vari- 

PQ: ables, provides a finer microscopic detail than the Ising model, that has been recently 

O 

; applied to some different repeat proteins, while being still amenable for an exact 

qh- solution. In partial agreement with the experiments, our results reveal a weakly 

three-state equilibrium and a two-state-like kinetics of the wild type protein despite 
^ ■ the presence of a non-trivial free-energy profile. These features appears to be related 



to a careful "design" of the free-energy landscape, so that mutations can alter this 
picture, stabilizing some intermediates and changing the position of the rate-limiting 



m 

step. Also the experimental findings of two alternative pathways, an N-terminal 
and a C-terminal one, are qualitatively confirmed, even if the variations in the rates 
upon the experimental mutations cannot be quantitatively reproduced. Interestingly, 

X - 

5h ■ folding and unfolding pathway appear to be different, even if closely related: a prop- 

er ■ 

erty that is not generally considered in the phenomenological interpretation of the 
experimental data. 
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I. INTRODUCTION 



In the last decade, repeat proteins have increasingly drawn the attention of researchers, 
due to their ubiquity, their abundance, and the fact that they provide a different folding 
paradigm with respect to the well known one of globular proteins, where complex native 
state geometries, characterized by local and nonlocal interactions, are most often associated 
to a simple two-state equilibrium and kinetics. On the contrary, repeat proteins are charac- 
terized by tandem arrays of the same structural motif (even if individual repeats show just 
partial sequence identity, typically, around 25%i). Such motifs are usually arranged in a lin- 
ear fashion, giving rise to elongated structures that may consist of a highly variable number 
of repeats. Interactions in such modular structures take place within a repeat and between 
adjacent repeats, while truly non-local interactions connecting non-contiguous repeats are 
lacking. While such organization provide a general-purpose scaffold that can be tuned to 
bind different species, it is quite suprising that it is still compatible with a cooperative, 
two-state folding. Indeed, recent experimental studies have revealed that repeat proteins 
typically show a two-state equilibrium but a multistate kinetics^, driving the attention on 
the existence of different folding pathways. From a theoretical point of view, repeat pro- 
teins provide an ideal framework for modeling and hypothesis-testing, due to their structural 
modularity, and to the fact that artificial molecules can be built from consensus sequences, 
so that the role of the different interactions and of the chain length can be dissected and ana- 
lyzed individually. Not surprisingly, the classical Ising model from statistical mechanics has 
been used to describe these almost linear systems with local nearest-neighbour interactions, 
where the spin variables have been identified with individual helices within a repeat^, or with 
entire repeats^, or with the elementary "foldons" identified in a more detailed molecular dy- 
namics simulation^. Typically, the external fields and neighbour interaction parameters (hi 
and Jj ; j+i respectively, in their typical textbook denominations) are derived from the ex- 
perimental analysis of the stability of constructs of different length, and are related to the 
variation of the areas accessible to the solvent in the folding process. The identification of 
the elementary spin variable with a piece of structure as a whole, hinders the possibility to 
investigate the detailed role of individual contacts between the residues, and of studying the 
origin of the cooperativity and multistate kinetics on the residue scale. 

Here, we use the Wako-Saito-Munoz-Eaton (WSME) model^-, where the state of each 
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residue i is described by a binary variable m« = 0, 1, representing the unfolded and native 
state, respectively. Formally, the model differs from the Ising one in that the interactions 
are not limited to next neighbours, but extend to any distance, provided that the variables 
corresponding to all the intervening residues are set to the native state. The model equilib- 
rium can be exactly calculated^^*^, so that energies, free-energies, and fractions of native 
residues can be easily evaluated. The folding and unfolding kinetics are studied through 
Monte Carlo simulation, with an elementary step corresponding to the folding/unfolding of 
one residue. 

The model has been applied to describe the folding of many proteins^— , and also to the 
study of force-induced denaturation of proteins and RNA— r — . We apply the WSME model 
to the study of Myotrophin, a 118 residues protein, ubiquitously expressed in all mammalian 
tissues^ - — , made up of four ankyrin repeats. Its equilibrium has been characterized exper- 
imentally as two-state by Peng and coworkers^ with thermal and chemical denaturation 
experiments, and later confirmed as such, at least as far as chemical denaturations is con- 
cerned, by Lowe and Itzhaki^i, that also studied the kinetics^i^. In the former paper, the 
authors propose an effective two-state framework to interpret the relaxation kinetics^ (more 
precisely, they actually observe some curvature in the unfolding arm of the chevron plot, 
that can be explained by postulating either a barrier shift or the existence of a high energy 
intermediate of negligible population). 

An extended analysis on several mutants leads them to conclude that, in order to explain 
within a unique framework the behavior of both the wild type and the mutants, pathway 
heterogeneity must be assumed, with the dominant pathway presenting a high energy in- 
termediate, which is lacking in the secondary one^. Even if their analysis contains several 
simplifying assumptions (for instance, the fact that the relaxation rate is just the sum of the 
rates along the two pathways) they are able to provide very good fits to the experimental 
data, and to determine that the two pathways present different nucleation sites, on the N- 
terminal or on the C-terminal part of the protein, respectively. Finally, they show how, by 
combining mutations, it is possible to make the protein switch between the two pathways. 

After fitting the model parameters to reproduce the fraction of native protein as a function 
of the denaturant concentration derived from experiments, we calculate the free energy 
profiles and relaxation rates, and characterize the relaxation pathways, at low and high 
denaturant concentrations, for the wild type protein and for a series of mutants, selected to 
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probe different regions and contact distances. We also simulate the set of mutations used in 
Ref.— , to test the double pathway hypothesis. 

Our goal is to reproduce, at least qualitatively, the experimental behavior, and to shed 
light on the nature of the folding nuclei, as well as to recover the role of "pathway switch" 
played by some mutations. Moreover, we want to clarify the different role played by muta- 
tions affecting local or non-local contacts in the same region. 

II. METHODS 
Model 

WSME is a native-centric model^, i.e. it relies on the knowledge of the native state of 
a protein to describe its equilibrium and kinetics. Its binary variables m^, accounting for 
the local backbone and side chain angles, describe the state of each residue k G [1, N] as 
ordered (native, = 1) and disordered (unfolded, = 0). Since the latter state allows a 
much larger number of microscopic realizations than the former, an entropic cost q k is given 
to the ordering of residue k. 

The model is described by the effective hamiltonian (indeed, a free energy, where the 
solvent and the fast degrees of freedom have been integrated out): 

N-l N j N 

H = - ^ e « A ^' II m k + X^ fcT + ac)m k) (1) 

i=l j=i+l k=i k=l 

where N is the number of residues in the molecule and T the absolute temperature. The 
product ni=i m fc takes value 1 if and only if all the peptide bonds from % to j are in the native 
state, thereby realizing the assumed interaction. Non-native interactions are disregarded, 
while native interactions are accounted for in the contact matrix Aij, which counts the 
number of contacts between atoms of non-contiguous residues % and j in the native structure, 
according to a cutoff distance criterion. In the following, we will use the contact map 
calculated from the crystal structure of Myotrophin deposited in the Protein Data Bank 
(PDB code: 2myo), considering that a contact is established if any two atoms (including 
hydrogens) from residues % and j are found at a distance less than 3.5 A. Figure [1] reports 
the resulting contact map. 

The expression above differs from the original one for the last term, accounting for the 
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FIG. 1. Weighted contact map for myotrophin (2myo). The area of each circle is proportional 
to the weight of the contact between residues i and j, in terms of the number of interatomic 
interactions (see text for details). Darker areas represent contacts between residues that belong to 
helices. 

interaction between the denaturant and the protein backbone (as suggested by Bolen and 
coworkers^, and also in agreement with the choice in Ref.-), where c represents the urea 
molar concentration and a is a new parameter. For the sake of simplicity, we take homo- 
geneous parameters e^j = e, $ = q, for each i and j, with e, q > 0, to model the wild type 
protein. We use the experimental thermodynamics data to adjust the parameters: we set 
q = 2R and find e, and a by fitting the equilibrium experimental data of Refs.— ^ for the 
wild type species. We first calculate the native and unfolded baselines n(z) and u(z), where 
z is the temperature or denaturant concentration, from a linear interpolation of the data far 
from the transition region. Then, we consider the order parameter: 



normalized between zero (unfolded) and one (native). Here m(z) is the equilibrium average 
fraction of folded residues, defined in Eq. ([3]). Then, we adjust the e parameter, imposing 
that the temperature T m at which p(T m ) = 0.5 coincides with the experimental mid-folding 
temperature T m = 327 K . Then, we do the same for the a parameter, imposing that 
p{cm) — 0.5 at the experimental mid-folding denaturant molar concentration c m = 3.2. 
The resulting values of the parameters are used in the whole study, for both wild type and 
mutated species. 

The results are reported in Fig. [2j 

Mutations are mimicked by perturbing a group of contacts, of one or more residues 
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FIG. 2. Fit of the order parameter p(z) from the WSME model to the experimental data for (a) 
urea-induced denaturation^i and (b) thermal denaturation2£. Parameters values are: e = 0.2276 
kj/mol, q = 0.0166 kJ/(K mol), a = 0.208 kJ/([urea] mol) 

as detailed below, through the addition of a Ae^j to the corresponding interactions. To 
make comparison easier, the same total perturbation of e toi =9.21 kJ/mol, comparable with 
those reported in Ref.—, is introduced for all mutants, so that the Aqj will vary between 
mutants, according to the number of affected contacts n ct : Ae^- = Stat/^d- The list of 
analyzed mutations is reported in Table [U and was selected to probe different regions of 
residues and different distances between contacting residues. 

We also analyze the case of some multiple mutations that have been investigated exper- 
imentally in Ref.— to test the two-pathway interpretation. In order to compare our model 
to those results, we have simulated the effect of those mutations applying a stabilizing or 
destabilizing perturbation (whose energy is taken from Ref.—) equally spread on all the 
contacts of the mutated residues. The following mutants are considered 

• E17V/D20L (AAGcq. = -7.12 - 2.09 kJ/mol C: a stabilized mutant) 





A9G (AAGe q . = 10.38 kJ/mol C) 
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name 


description 


WT 


wild type protein 




51,2 


contacts of residues [5. . 


. 10] with residues [17,18] 


S3 


contacts of residue 32 




53,4 


contacts of residues [36. 


. . 44] with residues [49. . . 56] 


<Sl,4 


contacts of residues [9. . 


. 18] with residues [45. . . 53] 


$5,6 


contacts of residues [71. 


. . 76] with residues [82. . . 88] 


53,6 


contacts of residues [42. 


. . 52] with residues [78. . . 83] 


s 7 


contacts of residue [103] with residues [94. . . 101] 


SV,8 


contacts of residues [104,105] with residues [113,114] 


£5,8 


contact between residue 76 and residue 113 



TABLE I. List of mutations analyzed in this work. The same total destabilization of 9.21 kJ/mol 
was used in all cases, adapting the individual Ae$ » of each contact. The indices m, n in S mt n specify 
the helices involved in the mutated contacts. S3 actually affects a loop residue, close to helix 3. 

• A115G (AAG cq . = 3.39 kJ/mol C) 

• A115G/A9G 

• A115G/E17V/D20L 

• A9G/E17V/D20L/A115G 

Multiple mutations are considered as independent, and the corresponding energetic per- 
turbation is applied to each point mutation separately, so that, e.g., AAGaii5g/A9G — 
AAG A ii5G + AAG A9G . 



Thermodynamics 

The equilibrium values of all thermodynamic quantities are calculated resorting to the 
exact solution of the model^ 1 ^. In particular, we will study the fraction of native residues: 

1 N 

i=i 
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and the free-energy profiles as a function of the number of native residues M: 



F(M) = -RT\ogZ M , (4) 

where Z M = J2{ mi } ex P ( — H/RT), and the sum ^ (•) is restricted to the states with a 
fixed number of native residues ^ mj = M. The Zm can be easily calculated within the 
framework of the exact solution mentioned above. The reaction coordinate is defined as 
p = M/N. Finally, we will study the average values 

tHj = (mij) , (5) 

and 

v id = ((1 - 171^1)171^(1 - m j+1 )) , (6) 

of the products m^j = Yli=i m k- Since these products take on value or 1, fj,^ and v^j 
represent the equilibrium probability that the region between % and j is native (and, in the 
second case, that it is capped by unfolded residues, thus representing an isolated native 
region) . 

Kinetics 

The kinetic evolution of the model is described through a discrete-time master equa- 
tion, p t+ i(x) = J2 X > W(x' — > x)p t (x), for the probability distribution pt(x) at time t, where 
x = {rrik, k — 1, . . .N} denotes the state of the system. Unfortunately this expression is 
not amenable to analytical treatment (even if an accurate semi-analytical approximation 
exists^ 1 ^), since by construction W{x' — > x) is a 2 N x 2 N matrix. Here the kinetics will 
be studied by means of Monte Carlo simulations: as in previous works^^i, the transition 
matrix W is specified by a single bond flip Metropolis rule, which implies that a flip is 
accepted or rejected according to its equilibrium probability, at the temperature and de- 
naturant concentration specified for the simulation. We study the kinetics in both folding 
(T=293.15 K, c=0) and unfolding conditions (T=293.15 K, c=12): in folding simulations, 
the initial state is a random configuration extracted with the infinite temperature equilib- 
rium probability, so that the initial and final fraction of native residues are m(t = 0) = 0.12 
and m(t = oo) = 0.97 respectively, for the wild type protein (slightly different values are 
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obtained for the mutants). In unfolding simulations, the fully native state (m; = 1 for each 
i) is assumed as the initial condition, while m(t = oo) = 0.0047, for the wild type protein. 

We study the relaxation of the average fraction of native residues m(t): at each time, 
the average is formally calculated as in Eq. but with the (•) now indicating the average 
over M single molecule simulations, that is, over an ensemble of M molecules. We choose 
A/"=2000 as a reasonable tradeoff between detecting a neat signal and reducing simulation 
time. We fit m(t) with one- or two-exponential expressions, namely: 

m(t) = m eq (T, c) + Cl e" M , (7) 

or 

m{t) = m eq (T, c) + c ie - klt + c 2 e~ k2t , (8) 

where m eq (T, c) is the equilibrium value at the temperature T and denaturant concentration 
c, obtained from the thermodynamics calculations. The fitting parameters are the rates fc* 
and the corresponding amplitude q. 

We also characterize the folding pathways, by observing the average times of structure 
formation: we consider the regions hi = I = 1, ... ,8 corresponding to the eight helices 

of native myotrophin, as well as the regions R a ,p encompassing the fragment from helix a to 
/3 inclusive (that is, from residue i a where helix a begins, to residue jp where helix j3 ends). 
After defining the folding time tf as the first passage time through the state with all the 
helices formed (-Ri,s), we identify, for each single molecule simulation of the folding process, 
the stabilization time t^p of each region R a> p as the last time it turns completely native 
(thus, waiting for all the fluctuations to fade away). This choice is a natural generalization 
of that proposed in Ref.— , to the present case with many elements of secondary structure: 
notice indeed that, due to the model characteristics, the stabilization of R a ,p in the native 
conformation is a necessary and sufficient condition for the formation of contacts between 

helix a and (3 (if any), as well as between all pairs of helices k,l, with a < k < I < (3. 

( f) 

The determination of t^L for all regions allows us to determine pathways in the secondary 
structure formation, and to identify two main pathways in the folding and unfolding of 
myotrophin (see Section [TTT1 below) . 

We also record the joint probabilities that two (non-overlapping) regions are native at 
the same time, for each single-molecule simulation. This is to avoid that a wildly fluctuating 
element, with a late stabilization, induce an artificial ordering along the pathway. We have 
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observed, in any case, that the only elements in myotrophin for which strong fluctuations 
could induce a problem are the first and last helix, which on the other hand turn out to 
be unimportant for pathway determination (see Section IIIip . For the other helices, we 
observe that local fluctuations can indeed invert the order by which a region is stabilized, in 
different single-molecule runs, starting from its constituent elements. However, the difference 
in stabilization times among the latter is small, allowing to group clearly which elements 
stabilize basically altogether in the folding process. 

We do the same for the unfolding simulations: now the unfolding time t u is defined as 
the first passage time in a state with m < 0.09, and for each single molecule simulation, 

(u) 

we record the last time t a p in which each region R a} p switches from the native to unfolded 
state. 

III. RESULTS 

A. EQUILIBRIUM 

1. Myotrophin presents a multi-minima free-energy profile, yet a sigmoidal 
equilibrium signal. 

Figure [2] reports the signals for the order parameter defined in Eq. |2j as a function of 
denaturant concentration and temperature. Notice that such order parameter, at difference 
with some common experimental techniques, provides a global information on the protein 
behavior, and the sigmoidal shape of its signals (which are even sharper than the experimen- 
tal data, probably due to the model energy function, that tends to enhance cooperativity^) 
suggests a two-state interpretation. However, the analysis of the free-energy profiles as a 
function of the number of native residues M reveals four minima, in both strongly renaturing 
and denaturating conditions (Fig. [3]). 

This apparent puzzle is solved by observing that, in the wild type species, the intermediate 
minima, for most temperature or denaturant concentration, are found at a free energy higher 
than the native or unfolded minima, and are not sufficiently populated to compete with them, 
resulting in an overall sigmoidal signal for the order parameter. Actually, the analysis of the 
profiles reveals that one intermediate becomes significantly populated in a small region close 
to the transition. Accordingly, a more detailed inspection of the signal of the order parameter 
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FIG. 3. Examples of the Myotrophin free-energy profiles as a function of the fraction of native 
residues p = M/N . Panel a: renaturing conditions ('c=0). Panel b: denaturing conditions (c=12). 



(see Ref.—, Figs. S2, S3) proves that a three-state fit yields an improvement in the accuracy of 
the fit which is statistically significant, according to the F-test. Mutations perturb the signal 
of the order parameter in position and/or in shape: in general, we have seen that destabilizing 
the central region lowers the mid-transition concentration, but basically preserves the degree 
of cooperativity implicit in the sigmoidal shape, while mutations at the N-term, and even 
more at the C-term, enhance the role of the intermediates, and can even induce a plateau; 
see Figure S4 in Ref.—. 

Figure [3] reports also the behavior of three mutants: 5*1,4, £3,6, £5,8) with mutation af- 
fecting the N-term, central region and C-term respectively. It can be noticed that the 
free-energy profiles of the three mutants depart from the wild-type one at different values of 
the order parameter: while the unfolded minimum is unaffected by all mutations, indicating 
that none of the perturbed contacts is formed in the unfolded state, the destabilization of 
the central region affects weakly structured conformations as well, suggesting that this re- 
gion contributes the most to the free-energy profile at low values of the reaction coordinate. 
Moreover, the profile at intermediate value of the reaction coordinate parallels that of the 
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wild type: these configurations are evenly perturbed by the mutation. On the other hand, 
the mutation perturbing the C-term only affects the native minimum (suggesting that the 
C-terminal helix just gets stabilized in the native structure) and the second barrier, with 
effects that will be clear in Section IIIIB1 Finally, perturbing the N-term shifts a bit the 
second intermediate, but does not affect the barriers. 

2. Native strings probabilities suggest the structure of the intermediate 
minima. 

A complementary, two-dimensional picture of the free energy landscape can be found 
in Figure IU where the u^j of Eq. [6] are reported, at refolding conditions. Each point u it j 
corresponds to the probability to find a native string, starting precisely at % and ending 
at j. It is easy to identify the native spot at the bottom right corner, and the isolated 
short structures represented by the short strings close to the diagonal. In addition to those, 
five extra spots of intermediate structure can be singled out, with the three sitting at the 
corners being more pronounced: the central ones (roughly centered at (32,92) and (32,106)) 
corresponds to the first intermediate in Fig. El at p « 0.5 , while the others, displaced towards 
the N-term (the spots around (5,92) and (5,106)) or C-term (the region centered at (32,115)) 
respectively, are represented by the intermediate around p = 0.75 in Fig. [3j Interestingly, 
mutants involving contacts at the N-term or C-term present different probabilities at the 
intermediate spots, and in the regions connecting them, suggesting that also the pathways 
could be different between the different species. 

Notice though that correlations between isolated structured regions are neglected by 
construction: the fact that, e.g., strings and (k,l) (with i < j < k < I) appear with 

high probability in Fig. HI does not imply that the configurations containing native structure 
at both regions and (k,l) are especially likely. So, even if these two-dimensional 

profiles already suggest possible pathways and folding mechanisms, they do not allow a 
quantitative characterization of the kinetics, and a detailed study of the latter must be 
performed independently, as in the following section. 
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(e)5 7 (f) 

FIG. 4. Inverse native strings probabilities \jv%j (Eq. [6|), at folding conditions c = 0, for the 
wild type and two mutants. On the right: detail of the bottom right region of the maps, close to 
the native structure. Although these maps cannot be directly related to the folding kinetics, the 
qualitative folding mechanisms emerging from them agree with the quantitative data from the MC 
kinetics (see Sec. IIII 51 below). The region around position (30,92) corresponds to the formation of 
a central nucleus encompassing the two central ankyrin repeats. From there, the native state at 
the bottom right can be reached either elongating rightwards and then downwards (a j pathway) or 
downwards and then rightwards (N-term, bf pathway). On both pathways, after the formation of 
the central nucleus, there are two regions of low probability, suggesting early and late barriers on 
both pathways. It is possible to guess that thel&rner pathway will be enhanced in the £3 mutant, 
and the latter in S7. 



B. KINETICS 



1. Effective two-state behavior emerges despite pathway heterogeneity. 

Monte Carlo simulation of both folding and unfolding of an ensemble of 2000 molecules 
of the wild type species reveal a single-exponential kinetics, as can be seen in Fig. El which 
is in agreement with the results in^k This simple behavior is apparently at odds with the 
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FIG. 5. Plot of the fraction of native residues m(t) as a function of time, for the folding (a) and 
the unfolding (b) process, for the wild-type and for two mutants. The wild type protein shows 
a single-exponential behavior, also common to the majority of the mutants. The two mutants 
here are chosen to represent the two-exponential behavior, for comparison. The average value was 
calculated over 2000 molecules simulations. 

multiple minima landscape reported in the free-energy profiles: to gain some detailed insight 
on how these characteristics can be simultaneously present, we have studied the relaxation 
events of individual molecules. Some representative examples of single-molecule relaxations 
for the wild type species are reported in Fig. El for the folding and unfolding case. We 
have seen that, neglecting the ubiquitous structure fluctuations, it is possible to identify 
some precise patterns in the relaxation process: typically, the folding evolves through the 

14 



stabilization of a central nucleus of four helices (second and third ankyrin repeat), after 
several events of formation of transient structures, involving at most individual repeats. 
The formation of structure at the interface between the second and third repeats typically 
triggers the immediate stabilization of both of them (even though this might be an artefact 
of the model, that just considers interactions if they take place within a native string). 
Then, the folding proceeds either in the C-term direction or towards the N-term, and finally 
it reaches completion to the native state. The unfolding at high denaturant concentration 
proceeds in a similar, but not perfectly symmetric way: there are still two possibilities, 
depending on which end unfolds first, but then the unfolding reaches completion abruptly, 
with an almost simultaneous unfolding of both the central part and the opposite end of the 
protein. To characterize in a more quantitative way these behaviors, we have identified, for 
each single molecule trajectory, the rate limiting steps, by inspection of the time differences 
At between stabilizations of different strings t^p along the folding or unfolding pathway, and 
identification of the biggest At between consecutive stabilizations. We have also classified the 
folding (unfolding) pathways by identifying the formation (disruption) of some key strings 
that trigger the successive events towards the N or C term. In both folding and unfolding 
case, we can distinguish two pathways, that we call a x , b x where x = f,u for the folding 
and unfolding case. Pathway a x is characterized by the C-terminal part getting structured 
earlier in the folding process, and disrupting later in the unfolding one. On the contrary 
the b x pathway favors earlier stabilization of the N-terminal part in the folding process, and 
its longer persistence in the unfolding one. However, folding and unfolding pathway of the 
same kind do not coincide, so that we distinguish them with the /, u labels. The detailed 
definition is as follows: for the folding process, we find that the event triggering the a/ 
pathway is the formation of a native string encompassing helix 4 to 7 before that of a native 
string from helix 2 to 5; the opposite order characterizes the bf pathway. For the unfolding 
process, a u is characterized by the contacts between helices 6 and 7 lasting more than those 
between helix 2 and 3, while the order is reverted in b u . We have seen that with the above 
definitions, it is possible to classify clearly and uniquely all the single-molecule relaxations 
(of the wild type and of the mutated species, see below) as belonging to either pathway. 

These results are summarized in Table HI1 where the rate and amplitude for the one- 
exponential fit and the fraction of molecules through the a and b pathways are reported. We 
find a dominance of the pathway a y through the C-term over the b f , and a slight dominance 
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FIG. 6. Single-molecules simulations of folding (panels a,b) and unfolding (c,d) events. Panels a,c: 
relaxation along pathway "a"; panels b,d: relaxation along pathway "b" (see text for details). The 
residue state rrii(t) is reported with differents colors: black if unfolded {rrii = 0) and white if native 
{rrii = 1). The vertical stripes indicate the positions of the a-helices. 



of b u over a u , pointing out that there may be some differences in the topography of the 
energy landscape in folding and unfolding conditions. 

How is it possible that two different pathways are present, while the folding and unfolding 
appear as two-state processes? The difference in the fraction of molecules following either 
pathway suggests that there is a little difference in the free-energy barrier that they have to 
surmount. This difference cannot be huge, since in that case it would result in rates along 
each pathway differing by order of magnitudes, which in turn would imply fluxes by just one 
channel. Moreover, the fact that several minima, connected by different barriers, are found in 
the free-energy profiles, but the relaxation kinetics is simply exponential (two-exponential fits 
fail to produce reliable results due to overfitting, data not shown), implies that either the rate 
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h |ci| 


a b 




(io- 7 ) 


(%) (%) 


folding 


1.279(9) 0.814(4) 


80.2 19.8 


unfolding 


4.76(8) 0.965(4) 


40.2 59.8 



TABLE II. Rates and amplitudes (from 1-exponential fits) and fraction of molecules that select 
folding pathway a and b, in the case of folding and unfolding relaxations. The errors have been 
estimated by dividing the proteins in 10 groups of 200 molecules each, evaluating a rate and 
amplitude from the fit of the average signal of each group, and calculating the mean and deviation 
of the mean of the resulting population of rates and amplitudes. 

limiting step is represented by crossing the first barrier along the pathway, effectively masking 
the other jumps, or the different barriers are associated to very similar rates. This picture is 
confirmed by the analysis of the average times of helix stabilization (or destabilization, in the 
unfolding process), reported in Fig. [TJ where the most representative patterns of secondary 
structure formation are reported separately for both the folding and unfolding pathways. It 
is clear from the top panels that the folding pathways are characterized by the formation of 
helices 3, 4, 5 basically altogether, around t=6 • 10 6 , followed by the extension, in another 
million of time steps, toward helices 6 and 7 in pathway a/, or helix 2 in pathway bf. Then, 
the rest of the structure folds almost at once. In the folding process, the longest time is 
associated to the formation of the initial nucleus, and the second longest one (and close to 
the former), to the completion of the folding along pathway a/. 

The unfolding process presents as well some common schemes: in the dominant b u path- 
way, unfolding proceeds from the C-term (helices 7, 6 and 5), passes through the last and 
first helix, and finally affects the rest of the N-terminal part. The a u pathway presents more 
variability, but the dominant mechanism is given by the a/ pathway covered in the opposite 
direction, and ending with the central group of helices 3, 4 and 5. Notice that in both the 
unfolding pathways, the second repeat appears as the last to unfold, and the longest time 
is usually associated to the unfolding of the last repeat, and in particular of helix 7. 

Fig. [7] suggests a clear picture of how the folding and unfolding proceed along the a x and 
b x pathways and gives an idea of what are the rate limiting steps, even if it does not inform 
on the detailed structure of the transition states and nuclei, because the latter could contain 
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(a)pathway a/, 1604 
molecules 



(b) pathway 6/, 396 
molecules 



le+OS 2e+06 3e+06 4e+06 5e+06 le+06 2e+06 3e+06 4e+06 5e+06 




(c)pathway a u , 804 (d)pathway b u , 1196 

molecules molecules 



FIG. 7. Patterns of helices stabilization in the folding process (top) and of their disruption in 
the unfolding one (bottom), from the analysis of 2000 single molecule relaxations, both for folding 
and unfolding events. Just the five most representative patterns are reported for each folding or 
unfolding pathway. In the top panels, the horizontal axis represents (ta,a), that is, the last time 
(in units of MC sweeps) that the helix a turns completely folded in the simulation. In the bottom 
panel, the corresponding quantity for unfolding is reported. The averages are performed on 

all the molecules n s following the same succession of events s, which can be read in the labels of the 
bars; the number n s is reported at the left of the y-axis. The color code is the same for all panels, 
and refers to the order of helix stabilization (destabilization in the unfolding case). Missing colors, 
as well as grouping of the corresponding labels, indicate that a group of helices folds (or unfolds) 
almost at the same time. When this grouping takes places, it is quite common to find patterns 
that differ just for the permutation of grouped labels: e.g. this is the case for the aj pathways 
reported in panel (a), where all the patterns are small variation of the same scheme in four steps: 
folding of helices 3, 4, 5 basically at the same time, then stabilization of helix 6 and then 7 after a 
short time, and finally completion of the folding. 

partially structure helices and hence need not coincide with a collection of fully formed 
helices, while the times ta)a, t$ a inform on when the helix a becomes stably structured 
or unstructured as a whole, and are affected by structural fluctuations within the helix. 
Moreover, averaging the times ta,a^ (that may vary a lot from molecule to molecule) gives 
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no information about the time evolution of any observable. 

For these reasons, the picture coming from Fig. [7] must be checked and confirmed by 
further analysis. To this end, we have simulated the effect of mutations, as explained in 
Section [Til an d performed the same analysis as for the wild type species. 

2. Analysis of simulated mutants confirms the proposed kinetics 
mechanism, with pathway heterogeneity. 

Table II I II summarizes the results for the kinetics of the mutants: most of the times, the 
m(t) signal can adequately be fitted with just one exponential (with rate k\) both in the 
folding and unfolding case, while some mutants present a second, faster phase k 2 , especially 
in the unfolding case. 
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(10- 7 ) 
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WT 


1.29 




80.2 


19.8 


5.08 




40.2 


59.8 


<Sl,2 


1.22 




85.1 


14.9 


5.20 


32.0 


85.7 


14.3 


S3 


0.058 


1.26 


98.7 


1.3 


5.00 




39.6 


60.4 


<S3,4 


0.116 




85.9 


14.1 


6.74 




38.8 


61.2 


£5,6 


1.22 




80.2 


19.8 


4.92 




41.0 


59.0 


s 7 


1.18 




45.3 


54.7 


6.45 




29.9 


70.1 


<sv,8 


1.26 




61.7 


38.3 


6.95 


47.8 


1.7 


98.3 


51,4 


1.28 




80.5 


19.5 


5.02 


21.4 


88.2 


11.8 


£3,6 


1.25 




78.2 


21.8 


5.13 




39.1 


60.9 




1.34 




69.1 


30.9 


6.99 


50.9 


1.6 


98.4 



TABLE III. Rate and fraction of molecules that select folding pathway a and b in the case of folding 
and unfolding dinamics for differt types of perturbation, as compared to the wild type (WT). Here 
the rates are calculated by fits on the whole ensemble of 2000 proteins. Missing entries in column k 2 
mean that the 1-exponential fit was sufficient, and the 2-exponential one would produce overfitting. 
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a. Folding kinetics. We see that in the folding case, mutations affecting helices 2,3 
and 7,8 do not change the rate much, but they are those that tune the flow along the two 
pathways. Mutations at the C-term cause the biggest pathway shifts, in agreement with the 
experimental results^. Mutations in the central nucleus (e.g. S^) affect the rate , but do 
not change much the distribution along the two pathways. Mutants S^g, are associated 
to contacts that do not stay at a barrier top: their destabilization weakly affects the rates 
and pathways. 

The above results confirm that the formation of the central nucleus of three helices 3,4 
and 5 is the rate limiting step for the folding process, followed by a growth of the nucleus 
towards the C or N term (pathways a/ and bf respectively). The barriers associated to these 
pathways are smaller enough than the nucleation one, so that most of the perturbations that 
shift the flow cannot "promote" these barriers to be the highest one. As a result, the above 
sequence of events is preserved in the mutants, and the folding rates are little affected and 
quite similar to the WT one, while the distribution of the flow changes according to which 
pathway is destabilized. The behavior of is consistent with the above observations: 
here the central nucleus is destabilized, which results in a slower rate with a moderate 
change in the flows. The only exception to this picture is mutation £3, that affects all the 
contacts (local and nonlocal) of residue 32, located in the loop between the first and second 
repeat. The destabilization of these contacts leaves the rate for the formation of the central 
structure unchanged, but produces a second, slower rate (that therefore we shall interpret 
as the folding rate), corresponding to the last steps of the folding along pathway aj, of 
stabilization of the structure at the first repeat. Accordingly, the corresponding folding flow 
goes almost completely through the aj pathway. Consistent with this intepretation, 
does not involve relevant changes in either the folding rate or fluxes, and appears downhill 
with respect to the crossing of the last barrier along the a/ pathway. 

b. Unfolding kinetics. As in the folding, mutations affecting the external regions (first 
and last ankyrin repeats) cause the biggest changes in the flow through the pathways; these 
changes agree with those observed in the folding case: a mutation causing a larger flux to- 
wards the cif pathway in folding will also cause an increase in the a u flux in unfolding, though 
of different magnitude. Moreover, these mutations are accompanied by the appearence of 
a second, faster rate, signaling that a part of the structure unfolds before the rate limiting 
step; the latter rapidly leads to the completion of the unfolding. On the other hand, muta- 
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tions affecting the central region do not cause major changes in either the rate or the flux 
distribution along the two pathways, with respect to the wild type. Interestingly, in the S3, 
£3 4 and S 5 fi species the changes in the flux have opposite sign in the folding and unfolding 
processes, again suggesting that the choice of the pathway is not controlled by the central 
repeats. As Table Hill suggests, the presence of two rates, in either folding or unfolding, is 
not related to the two different pathways for the kinetics, as one could naively think at the 
beginning. This is even clearer in Table \1V\ where the single or two-exponential fits are 
performed separately on the subsets of molecules following either pathways: in general, the 
need for a two-exponential fit for the full ensemble is associated to the presence of two rates 
in either the a or the b pathway. 
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37.2 
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5*1,4 


1.25 




1.42 




5.06 


23.2 


5*3,6 


1.21 




1.37 




4.51 


5.73 


55,8 


1.30 




1.35 




49.2 


7.12 54.2 



TABLE IV. Rates calculated on the subsets of molecules following the different pathways of folding 
and unfolding, as found in Table IIII1 

Another interesting thing is that the rates along the less populated pathways are usually 
comparable to or greater than those along the corresponding dominant one. This apparent 
contradiction can be explained if one considers that, due to the restriction of the fit to a 
specific subset of the molecules, the resulting rate is not related to the equilibrium distribu- 
tion, and it gives no information on the height of the free-energy barrier along the pathway. 
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This can be easily understood by considering, for example, wild type molecules: after the 
formation of the nucleus of the two central repeats, they have to choose whether to follow 
the cif pathway, with a low barrier, or the bf one, with a higher one. Table HTT1 shows that 
the majority of them will follow the former. Therefore, the fraction of molecules along the 
bf pathway will be given by those that choose that pathway early, i.e. in times shorter than 
the typical folding time along a/, producing an apparently faster rate. This can also be seen 
in the top panels of Fig. [7J the formation of helix 2 in the bf pathway takes place in a time 
of the same magnitude as the selection of helix 6 and 7 in the a/ one. After the formation 
of helix 2, the folding along bf is faster than the competing one. Thus, a fit restricted to the 
bf ensemble yields naturally a faster rate than the one found for the df pathway. Indeed, 
we have checked that if the bf pathway is imposed to the wild type protein, by associating 
an energy penalty to the formation of long native string towards the C-term, so that the 
protein cannot "escape" through the df pathway, the resulting rate kb f = 1.20- 10~ 7 is lower 
than that observed in free wild type. 

c. Multiple mutations In Ref.—, Lowe and Itzhaki induce switches between the path- 
ways by engineering multiple mutants. In order to compare our model to those results, we 
have simulated the effect of those mutations applying a stabilizing or destabilizing pertur- 
bation, as explained in Section HT1 

The model predictions for such mutants, reported in Tables SI and S2 of Ref.—, show 
that the redistribution of the flux along the pathway upon combination of point mutations 
is qualitatively as expected from the discussion in the previous section, so that a mutation 
favouring pathway a will balance the effects of a mutation favouring pathway b, recovering 
at least partially the WT flows, even if with smaller folding and bigger unfolding rates. It 
must be noticed, though, that the ratio between the corresponding rates do not reflect the 
experimental results, and the model fails to give quantitative predictions of the rates. 

IV. DISCUSSION AND CONCLUSIONS 

The results reported show that the WSME model reproduces qualitatively the experi- 
mental behavior: indeed, we find sigmoidal, apparently two-state-like equilibrium signals, 
a two-state-like kinetics with transient intermediates, and also two pathways in kinetics, 
characterized by the order of structure formation at the C- and the N-term. Moreover, we 
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find that precisely targeted perturbations of the contact interactions at different positions 
along the chain allow to induce pathway switches, as seen in experiments, or the stabilization 
of some transient intermediate resulting in a faster phase, and provide a way to probe the 
folding mechanisms to a great detail. The intrinsic complexity of the free-energy landscape 
of the protein myotrophin is evident already from the observation that the sigmoidal signals 
actually hides a three-state equilibrium (see Ref.— ), and from the analysis of the free-energy 
profiles, Fig. [3j For this protein the latter, 1-dimensional, projection is not sufficient to sug- 
gest the details of the kinetics, since the two different pathways cannot be distinguished just 
on the basis of the number of native residues, which is the natural reaction coordinate of the 
model. It is also important to notice that the free-energy profiles, in strongly denaturing or 
renaturing conditions, present small barriers (less than 3 RT), which seems at odds with the 
much slower rates found in the simulations. We see three possible reasons for such difference: 
first, in the presence of two different pathways in the configuration space, but with barriers 
located at similar values of the reaction coordinate and thus roughly overlapping in the pro- 
jection (see Fig. H] and the relative discussion), the profile will be always more representative 
of the lower of the two overlapping barriers, since, by construction, it is more representative 
of the states with higher Boltzmann weight. However, if, as in this case, the lower early 
barrier is on the a pathway and the lower late barrier is on the b pathway, the true barrier on 
each pathway will be higher than it may be inferred from the profiles. Second, the presence 
of wide and almost flat regions between each minima and the following barrier slows down 
the rate, according to the role of the curvature of the profile in Kramers' theory. Third, the 
projection collects together, at the same values of the reaction coordinate, configurations 
that can be highly different (in terms of the Hamming distance), especially in the unfolded 
region. This is irrelevant to kinetics as long as the motions in the transverse directions are 
fast enough to be relaxed at equilibrium when considering displacements along the reaction 
coordinate, but this is not necessarily granted for proteins with pathway heterogeneity. In- 
dependently from which of the above possibilities is more relevant, the important message 
from the above observations is that any quantitative conclusion about kinetics, derived from 
the analysis of the free-energy profiles, must be drawn with care, especially for proteins with 
a pathway heterogeneity. Yet, some hints for such a read-out of the kinetics come from 
the analysis of the equilibrium probability for the formation of native strings, Fig. HI even 
if such probabilities do not constitute a free-energy map, and cannot be used to predict 
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the kinetics by transition state theory or by solving a diffusion equation, they suggest the 
important spots playing a key role in the intermediate states, the possible folding pathways, 
and the way mutations can affect the kinetics, redirecting the folding and unfolding fluxes. 

The picture that emerges, and that could probably be generalized to other repeat pro- 
teins, is that in such proteins multi-minima free-energy profiles are the rule, with intermedi- 
ate states related to the completion of the folding of whole repeats or substructures of them. 
In this framework, the cooperativity in the equilibrium unfolding reported in experiments 
would be attained by a "designed" free-energy landscape, such that in all conditions the 
intermediates are associated to free-energies substantially higher than those of the native 
and unfolded states. Such design would involve sequence-heterogeneity between the differ- 
ent repeats, to ensure different degrees of stability to different partially folded structures. 
Mutations can alter this situation^, and indeed we see that cooperativity is reduced by per- 
turbing the N-term, and especially the C-term of Myotrophin. Two-state kinetics would 
most likely emerge, in such a multi-minima landscape, when the rate-limiting step coincides 
with the crossing of the first barrier encountered in the folding or unfolding process, and 
masks the crossing of the following barriers. Again, mutations may "promote" different bar- 
riers to the status of rate-limiting step, thus involving multi-state kinetics and/or pathway 
heterogeneity. 

Unfortunately, the predictions of the model, especially for the kinetics, cannot be made 
quantitative, at least at this level of simplicity. In the model we adjust only two parameters 
to reproduce the temperature and concentration of the folding midpoint, and at this level 
of simplification, we cannot reproduce the ratio of the rates between the mutants studied in 
the experiments. Also, the central nucleus that we find in the folding process is not reported 
in the intepretation of the experimental data by Lowe and Itzhaki (that propose a three- 
state dominant pathway plus a two-state secondary one to interpret their data, and assume 
that the total relaxation rate is just the sum of the rate along the two pathways). The 
enhancement of the role of such a common central nucleus, that delays the choice of either 
folding pathway to higher values of the reaction coordinate, might therefore be a model 
artifact, due to the model feature of considering the interaction energies as just proportional 
to the number of contacts of each residue: this may effectively penalize the terminal helices, 
that make fewer contacts. 

However, it is important to stress that the model gives predictions which go beyond the 
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experimental results, as for instance the detailed information about the pathways, providing 
useful conceptual frameworks for the intepretation of the experimental data. An important 
suggestion coming from the model predictions is that folding and unfolding pathways are 
not necessarily the same pathway, covered in opposite directions: the different denaturant 
concentrations (or temperature conditions) may involve subtle but important changes in the 
energy landscape, so that the overall mechanisms (for instance, the two-state, two-pathway 
kinetics) do not change, but the details of the pathways do. Indeed, in the presence of two 
possible pathways and a strong bias towards folding (or unfolding), it is most likely that at 
any "fork" in the pathway the protein will follow the trail with the smaller barrier at that 
point, and will be stuck on it, since backwards jumps will be strongly suppressed. In the 
present case, the lower folding barrier at lower values of the reaction coordinates on the a/ 
pathway, and the lower unfolding barrier at higher values of the reaction coordinates on the 
b u pathway, produce heterogeneity in the folding and unfolding pathways. Independently on 
how realistically this mechanism describes the behavior of myotrophin, it is an important 
warning for the design of simple interpretation frameworks for experimental results: if on the 
one hand, a phenomenological model must be kept as simple as possible, to avoid overfitting 
and the introduction of too many parameters, on the other hand, the use of simple models 
as the present one, with very few free parameters, may represent a useful alternative to get 
a grasp on the key mechanisms of the folding and unfolding process. 
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